Asynchronous Complex Analytics in a Distributed Dataflow Architecture (Extended Abstract)

نویسندگان

  • Joseph E. Gonzalez
  • Peter Bailis
  • Michael I. Jordan
  • Michael J. Franklin
  • Joseph M. Hellerstein
  • Ali Ghodsi
  • Ion Stoica
چکیده

Scalable distributed dataflow systems have recently experienced widespread adoption, with commodity dataflow engines such as Hadoop and Spark, and even commodity SQL engines routinely supporting increasingly sophisticated analytics tasks (e.g., support vector machines, logistic regression, collaborative filtering). However, these systems’ synchronous (often Bulk Synchronous Parallel) dataflow execution model is at odds with an increasingly important trend in the machine learning community: the use of asynchrony via shared, mutable state (i.e., data races) in convex programming tasks, which has—in a single-node context— delivered noteworthy empirical performance gains and inspired new research into asynchronous algorithms. In this work, we attempt to bridge this gap by evaluating the use of lightweight, asynchronous state transfer within a commodity dataflow engine. Specifically, we investigate the use of asynchronous sideways information passing (ASIP) that presents single-stage parallel iterators with a Volcanolike intra-operator iterator that can be used for asynchronous information passing. We port two synchronous convex programming algorithms, stochastic gradient descent and the alternating direction method of multipliers (ADMM), to use ASIPs. We evaluate an implementation of ASIPs within on Apache Spark that exhibits considerable speedups as well as a rich set of performance trade-offs in the use of these asynchronous algorithms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Asynchronous Complex Analytics in a Distributed Dataflow Architecture

Scalable distributed dataflow systems have recently experienced widespread adoption, with commodity dataflow engines such as Hadoop and Spark, and even commodity SQL engines routinely supporting increasingly sophisticated analytics tasks (e.g., support vector machines, logistic regression, collaborative filtering). However, these systems’ synchronous (often Bulk Synchronous Parallel) dataflow e...

متن کامل

Functional Reactive Stream Processing for Data-centric Publish/Subscribe Systems

The Internet of Things (IoT) paradigm has given rise to a new class of applications wherein complex data analytics must be performed in real-time on large volumes of fast-moving, heterogeneous sensor-generated data. Such data streams are often unbounded and must be processed in a distributed and parallel manner to ensure timely processing and delivery to interested subscribers. Dataflow archite...

متن کامل

Industry Paper: Reactive Stream Processing for Data-centric Publish/Subscribe

The Internet of Things (IoT) paradigm has given rise to a new class of applications wherein complex data analytics must be performed in real-time on large volumes of fastmoving and heterogeneous sensor-generated data. Such data streams are often unbounded and must be processed in a distributed and parallel manner to ensure timely processing and delivery to interested subscribers. Dataflow archi...

متن کامل

Compositionality in dataflow synchronous languages : specification & distributed code generation ∗ † ‡ Albert

Modularity is advocated as a solution for the design of large systems, the mathematical translation of this concept is often that of compositionality. This paper is devoted to the issues of compositionality for modular code generation, in dataflow synchronous languages. As careless reuse of object code in new or evolving system designs fails to work, we first concentrate on what are the additio...

متن کامل

Decentralized and Cooperative Multi-Sensor Multi-Target Tracking With Asynchronous Bearing Measurements

Bearings only tracking is a challenging issue with many applications in military and commercial areas. In distributed multi-sensor multi-target bearings only tracking, sensors are far from each other, but are exchanging data using telecommunication equipment. In addition to the general benefits of distributed systems, this tracking system has another important advantage: if the sensors are suff...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015